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Abstract 

To  understand  the  behavior  of  a  turbofan  engine,  one  first 
needs  to  deal  with  the  variety  of  data  acquisition  contexts. 
Each  time  a  set  of  measurements  is  acquired,  and  such  set 
may  account  for  tens  of  parameters,  the  aircraft  evolves  in  a 
specific  flight  mode.  A  diagnostic  of  the  engine  behavior 
models  the  observations  and  tests  if  anything  appears  as 
expected.  A  model  of  the  engine  measurement  vector  may 
be  very  complex  to  produce  and  even  more  to  deploy  on 
board.  The  idea  is  to  solve  the  problem  locally  on  recurrent 
phases  on  which  each  single  problem  may  be  easier  to 
answer.  Civil  flight  missions  are  straightforward  to 
decompose  as  they  are  very  recurrent.  It  is  more  difficult 
with  military  missions  and  bench  tests.  Once  a  set  of  phases 
is  defined,  local  regression  models  may  be  built.  To  solve 
nonlinearities  a  selection  of  computed  variables  is  a  good 
approach  but  such  algorithm  needs  the  definition  of  a  stable 
set  of  recurrent  phases  and  a  very  complex  learning 
procedure  that  uses  a  huge  amount  of  memory  to  deal  with 
the  high  dimensionality  of  the  problem.  Such  algorithm  is 
very  powerful  but  is  not  adapted  for  an  online  use.  Our  new 
solution  does  not  require  the  a  priori  knowledge  of  recurrent 
phases;  it  learns  recurrent  contexts  on  the  fly  and  adapts  a 
small  local  regression  model  on  a  selected  optimal  subspace. 
The  application  of  this  algorithm  seems  to  be  efficient  on 
long  term  flight  trend  monitoring  and  on  real  time  test  bench 
measurements.  It  solves  the  memory  problem  for  calibration 
by  an  iterative  autoadaptive  procedure  and  suppress  the 
need  of  preliminary  computations  of  specific  parameter  as  it 
auto-adapts  itself  with  piecewise  linear  models. 

1.  Introduction 

Turbofan  engine  abnormality  diagnosis  uses  three  steps: 
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reduction  of  dependencies  from  the  flight  context  (1), 
representation  of  the  measurement  in  an  adequate  metric 
space  suitable  for  classification  and  statistic  testing  (2)  and 
finally  identification  of  abnormal  behavior  (3)  as 
represented  on  Figure  1  (next  page).  This  work  essentially 
deals  with  the  first  normalization  step. 

The  current  text  focus  on  identification  of  flight  phases  to 
extract  subsamples  of  temporal  observations  where  the 
turbofan  gross  behavior  may  be  explained  by  simple 
(eventually  linear)  models.  This  example  is  easy  to 
visualize,  but  we  also  use  the  same  algorithm  on  different 
applications.  At  component  level  we  monitor  the  start 
system  (Flandrois,  Lacaille,  Masse,  &  Ausloos,  2009; 
Lacaille,  2009),  the  fuel  system  and  other  turbofan 
components.  Even  to  monitor  bench  test  cells  we  look  at 
vibration  monitoring  according  to  load  parameters  and  lots 
of  different  other  configurations  (Lacaille  &  Gerez,  2011, 
2012). 

The  first  step  of  the  algorithm  is  to  get  rid  of  acquisition 
context.  This  is  mandatory  because  we  need  to  compare 
similar  events,  observations  corresponding  to  one  unique 
and  standard  context.  For  this  purpose  we  use  a 
normalization  algorithm  (Figure  1,  step  1).  The  classical 
method  is  to  use  a  model  of  the  engine  observation 
measurements  named  endogenous  parameters  according  to 
the  flight  context  also  referred  as  exogenous  parameters  (see 
Table  1  for  a  list  of  parameter  examples).  The  residual 
between  real  endogenous  parameters  and  the  model  results 
is  then  used  as  inputs  to  a  scoring  algorithm  (Figure  1,  step 
2)  which  is  essentially  a  statistical  test  that  measures  the 
likelihood  of  the  current  observation.  The  main  problem  is 
the  construction  of  such  residual.  As  the  engine  behavior  is 
definitely  nonlinear  according  to  the  flight  measurements  a 
suggestion  is  to  cut  the  flight  in  recurrent  phases:  taxi, 
takeoff,  climb,  cruise,  descent,  etc.  and  models  the  behavior 
locally  on  those  phases.  However  as  such  decomposition 
seems  easy  to  build  on  civil  mission  it  is  a  real  challenge  on 
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Figure  1  -  The  mains  steps  of  any  diagnostic  application  for  aircraft  engine  monitoring. 


military  missions  which  all  are  different  as  well  as  for 
helicopter  missions,  business  jets  and  event  test  bench  tests. 

Table  1  -  Example  of  context  information  and  endogenous 
measurements.  Context  parameters  are  mainly  commands 
that  describe  the  current  engine  use  but  also  aircraft  attitude. 
Endogenous  measurements  represent  the  observation  we  are 
really  interested  in  to  describe  the  engine  behavior  if  the 
context  was  always  the  same  during  acquisition 


Name 

Description 

Index  information 

ACJD 

Aircraft  ID 

ESN 

Engine  Serial  Number 

LU 

Q 

J 

LL 

Flight  Date 

Context  information 

TAT 

External  temperature 

ALT 

Altitude 

AIE 

Anti  Ice  Engine 

AIW 

Anti  Ice  Wings 

BLD 

Bleed  valve  position 

ISOV 

ECS  Isolation  Valve  Position 

VBV 

Variable  Bleed  Valve  Position 

VSV 

Variable  Stator  Vane  Position 

HPTACC 

High  Pressure  Turbine  Active  Clearance  Control 

LPT ACC 

Low  Pressure  Turbine  Active  Clearance  Control 

RACC 

Rotor  Active  Clearance  Control 

ECS 

Environmental  Control  System 

TLA 

Thrust  Lever  Angle 

N1 

Fan  Speed 

XM 

Mach  Number 

Endogenous  measurements 

N2 

Core  Speed 

FF 

Fuel  Flow 

PS3 

Static  pressure  after  compression 

T3 

Temperature  after  compression 

EGT 

Exhaust  Gas  Temperature 

Our  first  approaches  uses  manual  extraction  of  flight  phases 
for  civil  engines  and  a  LASSO  algorithm  for  the  selection  of 
pertinent  analytical  combinations  of  parameters  to  build  the 
regression  model  and  then  a  autoadaptive  clustering  method 
that  uses  a  self-organizing  map  (SOM)  to  identify  the 
different  faults  or  behavior  differences  (Figure  1 ,  step  three 
“identification”).  This  work  was  presented  in  previous  work 
(Come,  Cottrell,  Verleysen,  &  Lacaille,  2010,  2011;  Cottrell 
et  al.,  2009;  Lacaille  &  Come,  2011). 

Even  when  flight  mode  identification  of  recurrent  phases  is 
clear,  the  normalization  model  that  currently  uses  a  LASSO 
regression  algorithm  needs  a  very  huge  amount  of  memory. 
The  LASSO  algorithm  needs  a  matrix  of  the  parameter 
measurements  in  memory:  as  an  example  the  data  for  one 
engine  from  a  set  of  500  medium  range  flights  with  100 
parameters  weight  around  1.5  Gb  when  acquired  at  1  Hz. 
Even  this  volume  of  data  is  not  easily  manageable  with 
classical  tools  and  standard  algebraic  operations  such  as 
singular  values  decomposition  (SVD)  which  is  the  base  tool 
in  linear  compression.  Hence  it  is  only  possible  to  calibrate 
this  model  on  ground  on  a  subsample  of  data  we  may 
download  from  a  small  subset  of  aircrafts  which  owners  (the 
airlines  or  military)  let  us  have  access  to  their  digital  flight 
data  recorders  (DFDR,  the  black  boxes).  The  resulting 
model  transferred  on  each  engine  is  finally  a  general 
approximation.  It  misses  the  specificity  of  each  engine  or 
event  the  particular  way  each  company  and  pilot  operates  its 
aircrafts. 

2.  Statistic  Mixture  Model 

To  solve  our  normalization  problem  iteratively  with  not  too 
much  memory  resources  involved  we  used  a  mixture  of 
probabilistic  principal  component  analysis  (MPPCA)  model. 
Such  model  is  an  extension  of  the  classical  PCA  which  goal 
is  to  extract  a  reduced  number  of  dimensions  on  which  the 
data  may  be  explained.  The  reduction  of  dimension  enables 
the  computation  of  meaningful  distances1  and  allows  the 


1  Distances  are  needed  to  compute  a  score  based  on  the  likelihood  of  the 
difference  between  observation  and  model  estimation.  In  high  dimension, 
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computation  of  scores.  However  if  the  general  behavior  of 
observations  is  not  linear  a  classical  PCA  algorithm  will 
fail.  A  nice  solution  is  to  make  the  hypothesis  that  in  each 
flight  mode,  the  local  behavior  of  the  engine  may  have  a 
linear  representation.  The  MPPCA  algorithm  will  use  EM 
(expectation/maximization)  optimization  scheme  to  identify 
the  clusters  and  to  build  the  local  projections. 

We  consider  that  we  dispose  of  a  datastream  D  = 
{xlf ... xN }  E  Rp ,  where  the  x  are  independent  realizations  of 
a  random  vector  X  E  Rp  .  In  addition,  [zlt  ...,zN]  are 
assumed  to  be  independent  realizations  of  an  unobserved 
(latent)  random  variable  Z  with  values  in  {1  (there 

exists  K  different  modes.)  The  MPPCA  model  assumes  that 
the  observed  random  vector  X  E  Rp  is,  conditionally  to  Z, 
linked  to  a  d-dimensional  latent  random  vector  Y  E  Rd 
through  a  linear  transformation  of  the  form: 

X\z=k  =  UkY  +  M/c  +  e  (1) 

where  Uk  is  the  p  x  d  orthogonal  transformation  matrix, 
\ik  E  Rp  is  the  mean  vector  of  the  k- th  factor  analyzer  and 
e  E  Rp  is  a  noise  term.  The  dimension  d  of  the  latent  vector 
is  such  that  d  <p  and  assumed  to  be  known  (Figure  2 
below  shows  an  illustration  of  K=2  d=2D  subspaces  in  a 
p=3D  domain). 


Figure  2  -  Illustration  of  two  Gaussian  d- 2  subspaces  in  a 
main  p- 3  dimension  space. 

Moreover,  e  is  assumed  to  be,  conditionally  to  Z,  a  centered 
Gaussian  noise  term  with  a  diagonal  covariance  matrix  bkIp : 

e\z=k=W(0,bkIp).  (2) 

Besides,  the  unobserved  latent  factor  Y  E  Rd  is  assumed  to 
be,  conditionally  to  Z,  distributed  according  to  a  Gaussian 
density  function  such  as: 


distances  lose  their  signification  which  is  also  known  as  the  curse  of  high 
dimensions.  We  try  to  limit  ourselves  to  a  selected  dimension  smaller  than 
5. 


Y\z=k  =  M(0,Idy  (3) 

This  implies  that  the  conditional  distribution  of  X  is  also 
Gaussian: 


X\Y,z=k  ~  N (UkY  +  pk>  bkIp )  (4) 

and  its  marginal  distribution  is  therefore  a  mixture  of 
Gaussians: 


K 

/(x)  =  ^  nkcp(x-,  nk,  Ik)  (5) 

k= 1 

where  nk  is  the  mixture  proportion  for  the  k- th  component, 
0  is  the  multivariate  Gaussian  density  function 


1 

0(x;  j uk,  Zfe)  = - - - j  x 

(2n)2  det(Zk)2  (6) 

CXP  'EfcTx-Jtfc)) 

and2:k  =  UkUk  +  bkIp. 

In  order  to  facilitate  the  description  of  our  online  inference 
procedure,  let  us  slightly  re -parameterize  the  above  model. 
Fet  us  first  introduce  the  orthonormal  transformation  matrix 
Qk  which  is  such  that  its  j-th  column  qkJ-  =  ific;/||u/c;|| 
where  ukJ-  is  the  corresponding  column  of  Uk  .  If  the 
transformation  matrix  Qk  is  orthonormal,  it  is  then 
necessary  to  report  the  variance  of  the  latent  factor  within 
the  distribution  of  the  latent  factor. 


We  therefore  now  assume  that  Y\z=k  =  X (0,  Afc)  where 
Ak  =  diag(Afcl, ...  Afcd).  The  marginal  distribution  of  X  is 
then  still  a  mixture  of  Gaussians  but  with  covariance 
matrices  T.k  =  QkAkQk  +  bkIp  .  By  denoting  by  Qk  = 
[Qk,  Rk\  the  p  x  p  matrix  made  of  Qk  and  an  orthonormal 
complementary  Rk ,  the  projected  covariance  matrix  QkY,kQk 
has  the  following  form: 


0 

akd 


o 


o 


\ 


d 


( P~d ) 


where  akJ-  =  AkJ-  +  bk  and  akJ-  >  bk  ,  for  k  =  1, ...  K  and 
j  =  1  With  these  notations,  the  mixture  of  PCA  model 
is  fully  parameterized  by  the  set  of  parameters  = 
with  each  0k  =  [nk,  \ik,  Qk,  (akJ);=1  K). 


It  can  be  shown  (Bouveyron,  Girard,  &  Schmid,  2007)  that 
the  MPPCA  model  is  identifiable  and  its  inference  can  be 
done  using  a  simple  EM  algorithm.  In  particular,  the  update 
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formula  in  the  M  step  for  the  orientation  matrices  Qk  and 
the  variance  parameters  akj  and  bk  are  as  follows: 

•  the  d  columns  of  Qk  are  estimated  by  the 
eigenvectors  associated  with  the  d  largest 
eigenvalues  of  the  empirical  covariance  matrix  Sk 
of  the  k- th  group, 

•  the  empirical  covariance  matrix  of  the  k-th  group  is 
Sk  =  ££i=i...n  tifcO;  -  Hfc)0;  -  l*fc)’  where  at 
each  current  step  tik  =  P(Z  =  k\xt;  9). 

•  akj  is  estimated  by  the  j-th  largest  eigenvalues  of 

Sk, 

•  bk  is  estimated  by  the  residual  variance  bk  = 


each  data  sample  is  being  discarded  after  being  processed 
(Bellas,  Bouveyron,  Cottrell,  &  Lacaille,  2013). 

Let  us  assume  that  we  initially  have  observed  a  dataset  of  n0 
data  samples  (xx  ...xno)  E  Rp  and  that  we  have  obtained  an 
initial  estimate  9  ^  of  these  data.  In  practice,  we  obtain  an 
initial  estimation  of  the  model  parameters  with  a  standard 
MPPCA  iterative  EM  algorithm  on  this  initial  dataset.  Let 
us  set  n  =  n0  and  consider  the  arrival  of  a  new  data  sample 
xn+i  e  RP. 

The  objective  is  therefore  to  update  the  estimate  of  6  from 
the  sole  knowledge  of  9 ^  and  xn+1 .  This  is  a  two-step 
procedure  which  involves  an  expectation  step  (E-step)  and  a 
maximization  step  (M-step). 

3.1.  The  E-step 


In  addition,  these  update  formulas  illustrate  the  strong  link 
between  MPPCA  and  the  principal  component  analysis 
(PCA)  method,  since  they  both  consider  eigenvectors 
corresponding  to  the  largest  eigenvalues  of  the  covariance 
matrix  eigen  decomposition. 

3.  Online  Inference  of  Parameters 

A  standard  way  to  estimate  model  parameters  in  parametric 
mixture  models  is  to  maximize  the  (observed)  log- 
likelihood  of  the  data. 


n  K 

^  7Tfc<KXi,  0fc)  (7) 

i= 1  k= 1 

Note  that  we  prefer  the  log-likelihood  over  the  likelihood,  as 
it  is  much  more  convenient  to  work  with  the  former  from  a 
mathematical  point  of  view.  The  maximum  likelihood 
method  then  proposes  to  estimate  the  parameters  of  the 
model  6by  9mv  =  argma xL(x;  9 ). 

As  we  saw  earlier,  complete  data  {(x1,z1),  (x2,z2), 
...,(xn,zn)}  are  composed  of  pairs  of  data  x  and  class 
information  z.  The  complete  log-likelihood  Lc(x,z ;  9)  is  the 
log-likelihood  calculated  from  the  complete  data: 

n  K 

^-11  t£°l°g  (7Tk0(Xi,  0k))  (8) 

i-1  k= 1 

Here,  we  have  defined  t  as  the  indicator  variable  of  the 
classes,  so  that  if  zt  =  k  for  a  data  sample  /,  then  tk^  =  1 
and  t\p  =  0 ,  Vy  =£  k. 

In  order  to  extend  MPPCA  to  the  online  setting,  we  develop 
hereafter  an  online  EM-based  algorithm  which  incorporates 
a  probabilistic  version  of  the  incremental  PCA  (Hall, 
Marshall,  &  Martin,  1998).  We  consider  here  a  setting 
where  data  samples  are  arriving  in  an  online  manner  and 


L(x;  0)  =  ^  log 


Before  updating  the  estimate  of  6 ,  it  is  necessary  to  compute 
the  expectation  of  the  complete  log-likelihood 
E(Lc(x,z;  9)  1 0^)  conditionally  to  the  current  estimate 

9in\ 

This  quantity  will  be  maximized  in  the  second  step  to  obtain 
the  new  estimate  9^n+1^  of  6.  As  with  all  mixture  models, 
the  computation  of  the  conditional  expectation  of  the 
complete  log-likelihood  reduces,  in  the  context  of  the 
MPPCA  model,  to  the  computation  of  the  probabilities 
t£n+1^  =  P(Z  =  k\X  =  x(n+1))  that  the  new  data  sample 
belongs  to  the  k-th  mixture  component  (Figure  3).  These 
probabilities  can  be  computed  as  follows: 


t(n+ 1)  _  ft*#  (xn+i;  0fc  ) 

T.l=l^l<p(xn+1>  ^(?1)) 

/ Ef=i  exp  (§(rk(n)(xn+1)  -  r,(n)(xn+1))) 


(9) 


where  the  classification  function  Tk  has  the  following  form: 


L(x)  =  \\nk  -  Pk(x)U\  +  A||x-Pfe(x)||2 


+ 


^  log(ak;)  +  (p  -  d)  log  (bk)  -  2  \og(nk ) 


(10) 


7  =  1 


with 


llx|lik  =  x  Akx 

Ak  =  Qk  A  Qk 

Pk(x)  =  QkQ'k(x-Hk)+^k- 
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y(n+l)n(n+l)  _  n(n+l)  *(n+l) 

V k  ~  V/c 

where  2l^n+1^  =  diag{Akl  ...Akp  }  and  this  for  k  = 
To  begin  with,  let  us  define: 


(13) 

1  ...K. 


si"*1’  _ 

where  g £n+1^  is  the  projection  of  the  data  sample  on  the 
subspace  defined  by  the  eigenvectors  and  h£n+1^  is  the 
residue  of  the  retro -projection  on  the  original  space.  With 
these  notations,  the  new  eigenvectors  Q (£L+1'}  correspond  to  a 
rotation  of  the  old  ones  plus  the  unit  residue  vector  h£n+1^: 


Figure  3  -  Geometric  interpretation  of  the  probability  that  a 
sample  belongs  to  a  given  class. 


3.2.  The  M-step 

Once  the  posterior  probabilities  t^+V}  have  been  computed, 
we  update  the  model  parameters  so  that  they  maximize 
E(Lc(x,Z)  In  order  to  derive  an  online  inference 

strategy  which  does  not  keep  all  past  data  samples,  it  is 
necessary  to  make  use  of  the  following  approximation: 


E(Lc(x,z;  6)\0^)  -  E[Lc(x,z;6)\6^n  x))  + 

'Yj  ffcn+1)  l0g  {nk<p(xn+l'l  0fcn)))  ( 
k= 1 

Then,  it  is  straightforward  to  show  that  the  update  formulas 
for  the  mixture  proportions  nk  and  the  component  means 
/ik ,  for  every  component  k  =  1 ...  K,  are: 


(n+l)  _  (n) 


7Tv 


ni 


+— (t, 

71  -1-1  V  J 


(n+l) 


-  71 


(n+l) 


where  n^1-1  =  nk 


k 

(n) 


n+l  v  k 

(»?W : ’  -  4”° 

(n+l) 


(n)\ 

k  )■> 

X„+l), 


+  4  andn  =  Sfc=1...K4 


(n) 


(12) 


We  then  want  to  estimate  the  parameters  Qk ,  and  bk ,  for 

k  =  1  ...K  and  j  =  1  ...d.  We  have  already  seen  that  the 
maximization  of  E(Lc(x,z;  0)|0^)  with  respect  to  these 
parameters  is  equivalent  to  the  eigen  decomposition  of  the 
empirical  covariance  matrix  5^  for  each  component 
k  =  1  ...K .  The  problem  that  we  seek  to  solve  can  be 
therefore  stated  as  follows:  having  already  calculated 
eigenvectors  Q ^  and  eigenvalues  A ^  from  the  n  first  data 
samples,  we  want  to  update  those  parameters  on  the  arrival 
of  a  (zzH-l)-th  data  sample.  In  particular,  on  the  arrival  of  the 
new  data  sample  xn+b  the  new  eigenproblem  that  we  need  to 
solve  is: 


An+ 1) 


.(n+l)  | 


h 


(n+D  _  J  ll  ^ (n+l)  ||  ’  lf  I.  - 

0,  otherwise. 


±  0 


(15) 


and  thus  the  new  eigenvectors  may  be  written: 


Q(n+l)  =  ^(n^n+D^n+l)  (16) 

where  is  a  rotation  matrix  of  size  (d+l)x(d+l).  Note 

that  Q ^  is  a  p  x  d  matrix,  since  we  have  discarded  the  p~d 
less  significant  eigenvalues.  The  new  covariance  matrix 

^(n+!)  for  ^  cjass  £  js  given  by; 


y(n+l)  ^ k  y  (n) 

Lk  ~  (n+i)  Lk 

nk 


An) 


+  ■ 


(«?"’) 


rxx 


(17) 


where  we  have  set  x  =  t£n+1)xn+1  —  l^k+V}  .  Then,  by 
substituting  equations  (16)  and  (17)  into  equation  (13)  we 
get2: 


[QF'h] 


n(n) 

nk  v(n)  , 
(n+l)  hk 
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(18) 
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The  above  problem  can  be  written  as: 
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<+1Xn+1) 


where  we  have  set  =  h^+1^'x.  The  solution  to  this 

new  eigenproblem  yields  the  rotation  matrix  /?£n+1^  and  the 
new  eigenvalues  2l^n+1^  directly.  Then,  the  new 
eigenvectors  can  be  obtained  using  equation  (16).  Note  that 


2  For  simplicity  we  omit  temporal  subscript  ( n  +  1)  for  vectors  hk  and  gk. 
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both  and  A^+1^  are  square  matrices  of  dimension 

d+ 1,  that  is,  we  only  need  to  solve  an  eigenproblem  of 
dimension  d+l  and  not  p.  The  update  formulas  for  the 
variance  parameters  akJ-  and  bk  are  then: 


(n+l)  _  A(n+ 1) 
ukj  ~  lvkj  > 


(ru) 


(20) 


4.  Comparisons  with  Online  EM  and  CEM 

We  compare  online  MPPCA  with  two  other  online 
algorithms,  online  EM  (Titterington,  1984)  and  online  CEM 
(Same,  Ambroise,  &  Govaert,  2007).  Note  that  these  latter 
have  not  been  designed  to  handle  high-dimensional  data. 
This  benchmark  was  done  on  simulated  data  because  we 
could  control  the  real  problem  dimension  which  is  not  the 
case  with  real  observations.  An  application  on  turbofan 
engine  measurements  is  given  in  the  next  section. 

For  this  experiment,  we  have  generated  a  dataset  of  n  - 
12000  data  samples  in  Rp  based  on  the  assumption  that  data 
live  in  low-dimensional  subspaces,  with  p  -  30  and  K-  3. 
Hereafter,  we  refer  to  this  dataset  as  D30  .  The  mixture 
proportions  are  7ii  =  0.4  and  7i2  =  ti3  =  0.3.  For  simplicity,  we 
have  considered  that  for  each  class,  the  variance  is  common 
across  all  dimensions,  that  is  akJ-  =  ak,  for  k  =  1 ...  K  and 
7  =  1 ...  d.  We  have  set  =  150,  a2  =  75,  a3  =  50,  b\  =  b2  = 

b3  =  5  and  pi  =  0,  p2  =  {0 _ 5 _ 0}  and  p3  =  {0...-5...0}, 

with  Pi,  p2>P3  £  Rp  -  We  have  set  the  intrinsic  dimension 
(dimension  of  the  subspaces)  at  d  =  2. 

We  also  simulate  a  second  dataset  of  lower  dimension  ( p  = 
10)  ,  generated  with  the  same  parameters  as  the  former.  We 
will  refer  to  this  new  dataset  as  D10. 

Our  goal  was  to  study  the  behavior  of  the  three  algorithms 
in  low  dimension  and  then  illustrate  the  capability  of  online 
MPPCA  to  cluster  efficiently  even  in  high  dimension. 


Figure  4  and  Figure  5  show  the  comparative  performance  of 
online  MPPCA  (black),  online  EM  (red)  and  online  CEM 
(blue)  for  the  datasets  D10  and  D30,  respectively. 

For  the  dataset  D10  it  is  clear,  both  from  the  clustering 
accuracy  and  the  MSE  that  online  MPPCA  converges  faster 
than  the  other  two  algorithms. 


Figure  4  -  Evolution  of  MSE  for  the  dataset  D10  versus  the 
number  of  data  samples  for  online  MPPCA  (black  solid), 
online  EM  (red  dashed)  and  online  CEM  (blue  dotted). 


No.  of  obs 

Figure  5  -  Evolution  of  MSE  for  the  dataset  D30  versus  the 
number  of  data  samples  for  online  MPPCA  (black  solid), 
online  EM  (red  dashed)  and  online  CEM  (blue  dotted). 

5.  Application  to  Engine  Health  Monitoring 


We  have  evaluated  the  three  algorithms  on  the  quality  of 
their  estimation  of  the  class  means  and  on  the  accuracy  of 
the  clustering  produced.  The  quality  of  the  estimation  of  the 
means  was  taken  to  be  the  square  of  the  distance  of  the 
estimated  means  to  the  true  ones,  averaged  over  all  K  =  3 
classes,  a  measure  known  as  the  Mean  Square  Error  (MSE) 
in  statistics 

MSEn =  _  (21) 

Online  MPPCA,  online  EM  and  online  CEM  were 
initialized  30  times  by  a  standard  MPPCA,  an  EM  and  a 
CEM,  respectively,  of  which  the  initialization  giving  the 
highest  BIC  value  was  kept. 


We  test  the  proposed  method  to  real  data  issued  from  the 
aircraft  engine  Health  Monitoring  domain.  The  data  were 
obtained  by  Snecma. 

Typically,  there  exists  different  phases  during  a  flight,  called 
flight  modes:  taking-off,  cruising,  landing  etc.  Each  test  is 
actually  a  sequence  of  alternating  stationary  and  non- 
stationary  phases  at  different  levels.  The  stationary  phases 
correspond  in  general  to  such  flight  modes,  while  the  non¬ 
stationary  ones  reflect  the  transition  between  two  such 
phases.  Nevertheless,  a  flight  mode  can  include  multiple 
stationary  phases,  that  is,  a  stationary  control  on  the  data  is 
not  enough  to  detect  the  flight  modes. 

Aircraft  engineers  can  identify  these  modes  by  looking  at 
the  data  but  this  can  be  extremely  time-consuming. 
Moreover,  due  to  the  high  dimensionality  of  data,  there  can 
be  relations  that  humans  cannot  perceive.  Note  that  by 
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knowing,  at  any  given  time,  in  which  flight  mode  the  engine 
currently  is,  tasks  like  anomaly  detection  can  be  performed 
much  more  reliably,  since  the  ’local’  context  of  the  data  is 
also  taken  into  account. 

The  experiment  below  (Figure  6)  involves  a  MPPCA  stage 
used  to  build  a  residual  vector  that  is  finally  classified  with  a 
self  organizing  map  (SOM).  The  score  represents  the 
distance  to  the  corresponding  class  center,  and  the  fault 


identification  is  obtained  as  the  map  cell  number. 

The  data  simulates  real  engine  normal  degradation  (usual 
wear)  to  be  detected  by  trend  monitoring  tools.  The  result 
appears  to  be  pertinent  for  operational  analysis  as  the  MRO 
operator  usually  waits  for  confirmation  before  any  customer 
notification. 
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6  -  Scoring  and  identification  of  trend  faults  using  a  self  organizing  map  after  MPPCA  normalization. 

Green  dots  are  the  true  detections  and  red  ones  the  false  alarms.  POD  stands  for  probability  of  detection  and  is 
given  as  a  point  to  point  count,  as  well  as  the  PFA  which  is  the  probability  of  false  alarm. 
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6.  Conclusion 

We  have  proposed  an  online  inference  algorithm  for  the 
MPPCA  model  which  relies  on  an  EM-based  procedure  and 
a  probabilistic  and  incremental  version  of  PCA.  The 
proposed  strategy  allows  to  incrementally  update  the 
estimates  of  the  MPPCA  parameters  at  the  arrival  of  a  new 
data  sample.  It  allows  also  providing  low-dimensional 
visualizations  of  the  data  based  on  sufficient  information. 
Model  selection  is  also  considered  in  the  online  setting 
through  parallel  computing.  Numerical  experiments  on 
simulated  and  real  data  have  shown  that  the  online  MPPCA 
algorithm  performs  better  in  high-dimensional  spaces 
compared  to  existing  online  EM-based  algorithms. 

Nomenclature 

ACARS  Aircraft  Communications  Addressing  and 

Reporting  System 

AIC  Akaike  Information  Criterion 

BIC  Bayesian  Information  Criterion 

DFDR  Digital  Flight  Data  Recorder 

EM  Expectation  Maximization 

LASSO  Least  Absolute  Shrinkage  and  Selection 

Operator 

MPPCA  Mixture  of  Probabilistic  PCA 

MRO  Maintenance  Repair  Overhaul 

MSE  Mean  Square  Error 

PCA  Principal  Component  Analysis 

SOM  Self  Organizing  Map 
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